Principal Boundary on Riemannian Manifolds
We consider the classification problem and focus on nonlinear methods for
classification on manifolds. For multivariate datasets lying on an embedded
nonlinear Riemannian manifold within the higher-dimensional ambient space, we
aim to acquire a classification boundary for the labeled classes, using the intrinsic metric on the manifold. Motivated by the search for an optimal boundary between the two classes, we propose a novel approach -- the principal boundary.
From the perspective of classification, the principal boundary is defined as an
optimal curve that moves in between the principal flows traced out from two
classes of data, and at any point on the boundary, it maximizes the margin
between the two classes. We estimate the boundary, together with its direction, under the supervision of the two principal flows. We show that the principal
boundary yields the usual decision boundary found by the support vector machine
in the sense that locally, the two boundaries coincide. Some optimality and
convergence properties of the random principal boundary and its population
counterpart are also shown. We illustrate how to find, use and interpret the
principal boundary with an application to real data. (Comment: 31 pages, 10 figures)
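As a rough illustration of the idea only (not the authors' estimator), the sketch below assumes the two principal flows on the unit sphere have already been discretized into matched point sequences and takes the geodesic midpoint of each matched pair as a crude equidistant boundary curve; all names and the example flows are hypothetical, and the actual principal boundary additionally optimizes its direction to maximize the margin.

import numpy as np

def slerp_midpoint(p, q):
    """Geodesic midpoint of two points on the unit sphere."""
    omega = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))   # angle between p, q
    if omega < 1e-12:
        return p
    m = (np.sin(omega / 2.0) / np.sin(omega)) * (p + q)   # slerp at t = 0.5
    return m / np.linalg.norm(m)

# Hypothetical matched discretizations of the two principal flows.
t = np.linspace(0.0, 1.0, 50)
flow1 = np.stack([np.cos(t), np.sin(t), 0.3 * np.ones_like(t)], axis=1)
flow2 = np.stack([np.cos(t), np.sin(t), -0.3 * np.ones_like(t)], axis=1)
flow1 /= np.linalg.norm(flow1, axis=1, keepdims=True)
flow2 /= np.linalg.norm(flow2, axis=1, keepdims=True)

boundary = np.array([slerp_midpoint(p, q) for p, q in zip(flow1, flow2)])
print(boundary.shape)   # (50, 3): an equidistant curve between the two flows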
Principal Sub-manifolds
We revisit the problem of finding principal components of multivariate datasets that lie on an embedded nonlinear Riemannian manifold within the
higher-dimensional space. Our aim is to extend the geometric interpretation of
PCA, while being able to capture the non-geodesic form of variation in the
data. We introduce the concept of a principal sub-manifold: a manifold passing through the center of the data that, at any point, moves in the direction of the highest curvature within the space spanned by the eigenvectors of the local tangent space PCA. Compared to the recent work in the case where the
sub-manifold is of dimension one (Panaretos, Pham and Yao 2014)--essentially a
curve lying on the manifold attempting to capture the one-dimensional
variation--the current setting is much more general. The principal sub-manifold
is therefore an extension of the principal flow, able to capture the higher-dimensional variation in the data. We show that the principal sub-manifold yields the usual principal components in Euclidean space. By means of examples, we illustrate how to find, use and interpret the principal sub-manifold, with an extension of its use to shape analysis.
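A minimal sketch of the local building block, assuming the manifold is the unit sphere: neighbours of a point x are mapped to the tangent space at x via the log map, and a PCA of these tangent vectors gives the local directions of variation that the principal sub-manifold follows. The function names and the bandwidth h are assumptions, not the paper's code.

import numpy as np

def log_map_sphere(x, y):
    """Log map on the unit sphere: the tangent vector at x pointing to y."""
    c = np.clip(np.dot(x, y), -1.0, 1.0)
    v = y - c * x                               # component orthogonal to x
    nv = np.linalg.norm(v)
    return np.zeros_like(x) if nv < 1e-12 else np.arccos(c) * v / nv

def local_tangent_pca(x, data, h=0.3):
    """PCA of the log-mapped neighbours of x within geodesic radius h."""
    vs = np.array([log_map_sphere(x, y) for y in data])
    vs = vs[np.linalg.norm(vs, axis=1) < h]     # keep only local neighbours
    _, _, vt = np.linalg.svd(vs - vs.mean(axis=0), full_matrices=False)
    return vt                                    # rows: local PC directions

# Example: points scattered around the equator of the sphere.
rng = np.random.default_rng(0)
ang = rng.uniform(0.0, 2.0 * np.pi, 500)
data = np.stack([np.cos(ang), np.sin(ang), 0.1 * rng.standard_normal(500)], axis=1)
data /= np.linalg.norm(data, axis=1, keepdims=True)
print(local_tangent_pca(data[0], data)[0])      # leading local direction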
A statistical approach to the inverse problem in magnetoencephalography
Magnetoencephalography (MEG) is an imaging technique used to measure the
magnetic field outside the human head produced by the electrical activity
inside the brain. The MEG inverse problem, identifying the location of the
electrical sources from the magnetic signal measurements, is ill-posed, that
is, there are an infinite number of mathematically correct solutions. Common
source localization methods assume the source does not vary with time and do
not provide estimates of the variability of the fitted model. Here, we
reformulate the MEG inverse problem by considering time-varying locations for
the sources and their electrical moments, and we model their time evolution
using a state space model. Based on our predictive model, we investigate the
inverse problem by finding the posterior source distribution given the multiple
channels of observations at each time rather than fitting fixed source
parameters. Our new model is more realistic than common models and allows us to estimate how the strength, orientation and position of the sources vary over time. We propose two new Monte Carlo methods based on sequential importance sampling. Unlike the usual MCMC sampling scheme, our new methods work in this situation without the need to tune a high-dimensional transition kernel, which would be very costly. The dimensionality of the unknown parameters is extremely large, and the size of the data is even larger. We use Parallel Virtual Machine (PVM) to speed up the computation. (Comment: Published at http://dx.doi.org/10.1214/14-AOAS716 in the Annals of Applied Statistics, http://www.imstat.org/aoas/, by the Institute of Mathematical Statistics, http://www.imstat.org.)
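For intuition only, here is a minimal bootstrap particle filter (sequential importance sampling with resampling) on a toy one-dimensional linear-Gaussian state space model; the paper's samplers for the MEG model are substantially more elaborate, and the model below is merely a stand-in for the time-varying source parameters.

import numpy as np

rng = np.random.default_rng(0)
T, N = 100, 1000                      # time steps, particles
a, q, r = 0.95, 0.1, 0.5              # AR coefficient, state and obs noise sd

# Simulate hidden states and noisy observations.
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + q * rng.standard_normal()
y = x + r * rng.standard_normal(T)

particles = rng.standard_normal(N)
est = np.zeros(T)
for t in range(T):
    particles = a * particles + q * rng.standard_normal(N)  # propagate
    logw = -0.5 * ((y[t] - particles) / r) ** 2             # importance weights
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[t] = np.sum(w * particles)                          # filtered mean
    particles = rng.choice(particles, size=N, p=w)          # resample

print(np.mean((est - x) ** 2))  # filtering error on the toy model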
Optimal classification in sparse Gaussian graphic model
Consider a two-class classification problem where the number of features is
much larger than the sample size. The features are masked by Gaussian noise
with mean zero and covariance matrix $\Sigma$, where the precision matrix $\Omega = \Sigma^{-1}$ is unknown but is presumably sparse. The useful features, also unknown, are sparse and each contributes weakly (i.e., rare and weak) to the classification decision. By obtaining a reasonably good estimate of $\Omega$, we formulate the setting as a linear regression model. We propose a
two-stage classification method where we first select features by the method of
Innovated Thresholding (IT), and then use the retained features and Fisher's
LDA for classification. In this approach, a crucial problem is how to set the
threshold of IT. We approach this problem by adapting the recent innovation of
Higher Criticism Thresholding (HCT). We find that when useful features are rare
and weak, the limiting behavior of HCT is essentially just as good as the
limiting behavior of the ideal threshold, the threshold one would choose if the
underlying distribution of the signals is known (if only). Somewhat
surprisingly, when $\Omega$ is sufficiently sparse, its off-diagonal coordinates usually do not have a major influence over the classification decision. Compared to recent work in the case where $\Omega$ is the identity
matrix [Proc. Natl. Acad. Sci. USA 105 (2008) 14790-14795; Philos. Trans. R.
Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367 (2009) 4449-4470], the current
setting is much more general, calling for a new approach and much more
sophisticated analysis. One key component of the analysis is the intimate
relationship between HCT and Fisher's separation. Another key component is the
tight large-deviation bounds for empirical processes for data with
unconventional correlation structures, where graph theory on vertex coloring
plays an important role. (Comment: Published at http://dx.doi.org/10.1214/13-AOS1163 in the Annals of Statistics, http://www.imstat.org/aos/, by the Institute of Mathematical Statistics, http://www.imstat.org.)
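A minimal sketch of the HCT step on a vector of feature z-scores; in the paper it is applied after Innovated Thresholding has transformed the data with the estimated precision matrix, and details below such as the search range alpha0 are assumptions for illustration.

import numpy as np
from scipy.stats import norm

def hct_threshold(z, alpha0=0.10):
    """Return the absolute-z threshold maximizing the HC objective."""
    p = len(z)
    pvals = np.sort(2.0 * norm.sf(np.abs(z)))          # two-sided p-values
    i = np.arange(1, p + 1)
    hc = np.sqrt(p) * (i / p - pvals) / np.sqrt(pvals * (1.0 - pvals) + 1e-12)
    k = int(np.argmax(hc[: max(1, int(alpha0 * p))]))  # search smallest p-values
    return np.sort(np.abs(z))[::-1][k]                  # matching |z| threshold

rng = np.random.default_rng(1)
z = rng.standard_normal(10000)
z[:30] += 3.0                                           # rare and weak signals
thr = hct_threshold(z)
selected = np.abs(z) >= thr                             # features kept for LDA
print(thr, int(selected.sum()))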
Fixed Boundary Flows
We consider the fixed boundary flow, which carries the canonical interpretability of principal components extended to non-linear Riemannian manifolds. We aim to find a flow with fixed starting and ending points for multivariate datasets lying on an embedded non-linear Riemannian manifold; unlike the principal flow, which starts from the center of the data cloud, both endpoints are given in advance, and distances are measured with the intrinsic metric on the manifold. From the
perspective of geometry, the fixed boundary flow is defined as an optimal curve moving within the data cloud: at any point, it maximizes the inner product between a locally computed vector field and the tangent vector of the flow. The rigorous
definition is given by means of an Euler-Lagrange problem, and its solution is
reduced to that of a Differential Algebraic Equation (DAE). A high-level algorithm is developed to compute the fixed boundary flow numerically. We show that the fixed boundary flow yields a concatenation of three segments, one of which coincides with the usual principal flow when the manifold reduces to Euclidean space. We illustrate how the fixed boundary flow can be used and interpreted, and demonstrate its application to real data.
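A crude numerical sketch of the defining criterion, not the paper's Euler-Lagrange/DAE solver: a polyline on the unit sphere with both endpoints pinned is adjusted by finite-difference gradient ascent so that its tangents align with a vector field; the field V below is a hypothetical stand-in, not one estimated from data.

import numpy as np

def V(x):
    """Hypothetical vector field: rotation about the z-axis, projected
    onto the tangent space at x."""
    v = np.array([-x[1], x[0], 0.0])
    return v - np.dot(v, x) * x

def objective(pts):
    """Sum of inner products of the field with central-difference tangents."""
    tangents = pts[2:] - pts[:-2]
    return sum(np.dot(V(p), t) for p, t in zip(pts[1:-1], tangents))

def ascent_step(pts, eta=1e-2, eps=1e-5):
    """One projected gradient-ascent step; the two endpoints stay fixed."""
    grad = np.zeros_like(pts)
    for i in range(1, len(pts) - 1):
        for j in range(3):                      # finite-difference gradient
            d = np.zeros(3)
            d[j] = eps
            hi, lo = pts.copy(), pts.copy()
            hi[i] += d
            lo[i] -= d
            grad[i, j] = (objective(hi) - objective(lo)) / (2 * eps)
    pts = pts.copy()
    pts[1:-1] += eta * grad[1:-1]
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)  # back to sphere

# Initial guess: a quarter meridian between the two fixed endpoints.
t = np.linspace(0.0, np.pi / 2, 20)
pts = np.stack([np.sin(t), np.zeros_like(t), np.cos(t)], axis=1)
for _ in range(200):
    pts = ascent_step(pts)
print(objective(pts))   # alignment improves over the initial meridian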
Manifold Fitting under Unbounded Noise
There has been an emerging trend in non-Euclidean dimension reduction of
aiming to recover a low dimensional structure, namely a manifold, underlying
the high dimensional data. Recovering the manifold requires the noise to satisfy certain concentration conditions. Existing methods address this problem by constructing an output manifold based on the tangent space estimated at each sample point. Although theoretical convergence is guaranteed for these methods, the guarantees require either noiseless samples or bounded noise. However, if the noise is
unbounded, which is a common scenario, the tangent space estimation of the
noisy samples will be blurred, thereby breaking the manifold fitting. In this
paper, we introduce a new manifold-fitting method, by which the output manifold
is constructed by directly estimating the tangent spaces at the projected
points on the underlying manifold, rather than at the sample points, to
decrease the error caused by the noise. Our new method provides theoretical convergence guarantees, in terms of an upper bound on the Hausdorff distance between the output and the underlying manifold and a lower bound on the reach of the output manifold, even when the noise is unbounded. Numerical simulations are provided to
validate our theoretical findings and demonstrate the advantages of our method
over other relevant methods. Finally, our method is applied to real data examples.
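A simplified sketch of the key idea, under strong assumptions (a Gaussian-kernel weighted mean as a crude projection toward the manifold; not the paper's estimator): the tangent space is estimated at the projected point rather than at the noisy sample itself.

import numpy as np

def weighted_projection(y, data, h=0.2):
    """Pull y toward the manifold via a Gaussian-kernel weighted mean."""
    w = np.exp(-np.sum((data - y) ** 2, axis=1) / (2 * h ** 2))
    return (w[:, None] * data).sum(axis=0) / w.sum()

def tangent_at_projection(y, data, h=0.2, d=1):
    """Estimate a d-dimensional tangent space at the projected point z,
    not at the noisy sample y itself."""
    z = weighted_projection(y, data, h)
    w = np.exp(-np.sum((data - z) ** 2, axis=1) / (2 * h ** 2))
    cov = ((w[:, None] * (data - z)).T @ (data - z)) / w.sum()
    vals, vecs = np.linalg.eigh(cov)
    return z, vecs[:, -d:]               # top-d eigenvectors span the tangent

# Noisy circle in the plane; tangent estimated at the projection of a sample.
rng = np.random.default_rng(2)
ang = rng.uniform(0.0, 2.0 * np.pi, 2000)
data = np.stack([np.cos(ang), np.sin(ang)], axis=1)
data += 0.05 * rng.standard_normal((2000, 2))
z, tang = tangent_at_projection(data[0], data)
print(z, tang.ravel())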
Estimation of Ridge Using Nonlinear Transformation on Density Function
Ridges play a vital role in accurately approximating the underlying structure
of manifolds. In this paper, we explore the ridge's variation by applying a
concave nonlinear transformation to the density function. Through the
derivation of the Hessian matrix, we observe that nonlinear transformations
yield a rank-one modification of the Hessian matrix. Leveraging the variational
properties of eigenvalue problems, we establish a partial order inclusion
relationship among the corresponding ridges. We further observe that the transformation can lead to improved estimation of the tangent space via this rank-one modification of the Hessian matrix. To validate our theory, we conduct extensive numerical experiments on synthetic and real-world datasets, which demonstrate that the ridges obtained from our transformed approach approximate the underlying true manifold better than other manifold fitting algorithms.
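The rank-one modification can be checked directly for the concave transform t(u) = log u: for g = log p one has Hess g = Hess p / p - (grad p)(grad p)^T / p^2, so the transformed Hessian differs from a rescaling of Hess p by a rank-one term. Below is a small finite-difference verification on a standard 2-D Gaussian density, an illustrative choice rather than an example from the paper.

import numpy as np

def p(x):
    """Standard bivariate Gaussian density."""
    return np.exp(-0.5 * np.dot(x, x)) / (2.0 * np.pi)

def grad_hess(f, x, eps=1e-5):
    """Finite-difference gradient and Hessian of a scalar function."""
    n = len(x)
    g, H = np.zeros(n), np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
        for j in range(n):
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (f(x + e + ej) - f(x + e - ej)
                       - f(x - e + ej) + f(x - e - ej)) / (4 * eps ** 2)
    return g, H

x = np.array([0.7, -0.3])
gp, Hp = grad_hess(p, x)
_, Hlog = grad_hess(lambda u: np.log(p(u)), x)
# The identity Hess(log p) = Hess(p)/p - (grad p)(grad p)^T / p^2 holds.
print(np.allclose(Hlog, Hp / p(x) - np.outer(gp, gp) / p(x) ** 2, atol=1e-4))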